An Experiment in Spanish-Portuguese Statistical Machine Translation

Authors

  • Wilker Ferreira Aziz
  • Thiago Alexandre Salgueiro Pardo
  • Ivandré Paraboni
Abstract

Statistical approaches to machine translation have long been successfully applied to a number of ‘distant’ language pairs such as English-Arabic and English-Chinese. In this work we describe an experiment in statistical machine translation between two ‘related’ languages: European Spanish and Brazilian Portuguese. Preliminary results suggest not only that statistical approaches are comparable to a rule-based system, but also that they are more adaptive and take considerably less effort to develop.

Similar resources

Statistical Phrase-based Machine Translation: Experiments with Brazilian Portuguese

Statistical approaches have recently emerged as the main paradigm in Machine Translation (MT) research. In previous work we have shown that results of a simple statistical word-based MT system may be highly comparable to those produced by a rule-based approach for closely-related languages such as Brazilian Portuguese and European Spanish. In this work we take the discussion one step further an...

Statistical Machine Translation of Broadcast News from Spanish to Portuguese

In this paper we describe the work carried out to develop an automatic system for translation of broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish News and Statistical Machine Translation (SMT) of the results to the Portuguese language. ASR of broadcast news is based on the AUDIMUS...

Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora

This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at document- and sentence-level. The resulting corpora contain about 2,700 parallel documents totaling over 150,000 al...

The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine

The biomedical scientific literature is a rich source of information not only in the English language, for which it is more abundant, but also in other languages, such as Portuguese, Spanish and French. We present the first freely available parallel corpus of scientific publications for the biomedical domain. Documents from the “Biological Sciences” and “Health Sciences” categories were retriev...

Improved Statistical Machine Translation for Resource-Poor Languages Using Related Resource-Rich Languages

We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y fo...

Publication date: 2008